Yige Yuan

R²PO: Decoupling Training Trajectories from Inference Responses for LLM Reasoning

Jan 17, 2026
Via arXiv

Do We Always Need Query-Level Workflows? Rethinking Agentic Workflow Generation for Multi-Agent Systems

Jan 16, 2026
Via arXiv

GIFT: Games as Informal Training for Generalizable LLMs

Jan 09, 2026
Via arXiv

Simple Denoising Diffusion Language Models

Oct 27, 2025
Via arXiv

From Outcomes to Processes: Guiding PRM Learning from ORM for Inference-Time Alignment

Jun 14, 2025
Via arXiv

Incentivizing Strong Reasoning from Weak Supervision

May 28, 2025
Via arXiv

Inference-time Alignment in Continuous Space

May 26, 2025
Via arXiv

Incentivizing Reasoning from Weak Supervision

May 26, 2025
Via arXiv

InfoNCE is a Free Lunch for Semantically guided Graph Contrastive Learning

May 07, 2025
Via arXiv

On a Connection Between Imitation Learning and RLHF

Mar 07, 2025
Via arXiv